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Abstract 

In many situations in numerical analysis least squares loss functions with diagonal 
weight matrices are much easier to minimize than least square loss functions with full 
positive semi-definite weight matrices. We use majorization to replace problems with a 
full weight matrix by a sequence of diagonal weight matrix problems. Diagonal weights 
which optimally approximate the full weights are computed using a simple semi-definite 
programming procedure. 
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Note: This is a working paper which will be expanded/updated frequently. All suggestions for 
improvement are welcome. The directory deleeuwpdx.net/pubfolders/wls has a pdf version, 
the complete Rmd file, the R code, and the bib file. 

1 Introduction 

In this paper we study the problem of minimizing a weighted least- squares loss function 
cr(x) = x'Wx over a set X C M n . Here IT is a positive semi-definite matrix of order n. 

Examples are linear least squares, where X is of the form y — A(3, or isotone regression, where 
X = y — /C with K the cone of isotone vectors, or nonlinear least squares, where the elements 
of X are of the form y — F(9 ) for F : —> W n and 6 G M p . 

The idea is that weighted least squares problems with general IT are more difficult to solve 
than problems with a diagonal IT. We use majorization theory to reduce problems with 
non-diagonal weights to sequences of problems with diagonal weights. General discussions 
of majorization (also known as MM) algorithms are in De Leeuw (2016) and Lange (2016). 
Majorization theory has been used earlier in the context of weighted least squares problems 
by Heiser (1995), Kiers (1997) and Groenen, Giaquint-o, and Kiers (2003), although their 
applications are limited to linear regression and low-rank matrix approximation (also known 
as principal component analysis) with rather simple bounds. 

2 Majorization 

Write x = z + (x — z) and thus 

a(x) = (z + (x — z))'W(z + (x — z)) = cr(z) + 2z'W(x — z) + (x — z)'W(x — z). 

Now suppose we have a diagonal D satisfying D > IT in the Loewner sense, i.e. D — W is 
positive semi-definite. Then a(x) < r](x, z) for all x, z G X, where 

r](x, z) := a(z) + 2z'W(x — z) + (x — z)'D(x — z), 

and, of course, a(x) = r}(x,x) for all x G X. Thus r\ : X ® X —* M is a majorization scheme 
in the sense of De Leeuw (2016), chapter 7. The corresponding majorization algorithm is 

a;( fc+1 ) G Argmin r}(x,x^), 
xex 

and the sandwich inequality is 

cr(x ( ' fe+1 - > ) < 77(a^ fe+1 \ x^) < rj(x^ k \ x^) = a(x^). 

The first- inequality is strict if the minimizer of y(x, x^) is unique, the second is strict if 
D — W is positive definite. 

Define r(z) := (/ — D _1 IT)^. Then, completing the square, minimizing r)(x, z) over x is the 
same as minimizing 

p(x, z) := (x — r(z))'D(x — r(z)) 
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over x. 


Consequently an iteration of the majorization algorithm finds by projecting r^x^) on 

X, using the metric D. 

2.1 Singularity 

It should be mentioned that our majorization method also applies if W is singular. In fact we 
can show that D will be positive definite in all situations of interest. Suppose D is singular, 
i.e. it has zero diagonal elements. Then D > W implies that W also has the same zero 
diagonal elements, and because W is positive semi-definite all off-diagonal elements of W 
corresponding with these diagonal elements are zero as well. If / is the index set for which 
the diagonal elements of W are zero, then we must minimize z' I W I z I and for majorization 
use a positive definite Dj > Wj. Thus majorization can be also be used to regularize singular 
problems. 

If W is indefinite it is still possible to choose a positive definite diagonal D such that 
D — W > 0. We will still have a majorization algorithm that generates a decreasing sequence 
of loss function values, but the loss function may no longer be bounded below, and thus the 
sequence of loss function values may not converge. 


3 Rate of Convergence 


3.1 Linear Least Squares 


For simplicity we start with the linear case. Thus X = {x \ x = y — Af3}. Assume A to be 
of full column rank, and assume D to be positive definite. We can write the majorization 
scheme as 


Thus 


7 ) = < 7 ( 7 ) + 2 (v ~ A-i)'WA{^ - p) + (7 - P)'A!DA( 7 - p). 
7 ) = -2 A'W(y - Ay) - 2 A'DA^ - 0), 


which is zero for 

P = 7 + {XDA)- l AW(y - A'y) = ( XDA)~ l XWy + (/ - (A I DA)~ 1 A , WA) 1 . 


The iterations take the form 

p{k+i) = (A , DA)- 1 A'Wy + (/ - (A , DA)~ 1 A'WA)p {k \ 

and thus 


p(k+i) _ p(k) = (/ _ (A'DA)- 1 A'WA)(p { V - / 3 (fc - 1) ). ( 1 ) 

The speed of convergence of the iteration is determined by the eigenvalues of ( A'DA)~ 1 A'WA , 
which are less than or equal to one. Thus the smaller D is, given that D — W > 0, the faster 
the convergence will be. The linear convergence rate of our majorization algorithm is the 
largest eigenvalue of / — (A'DA)^ 1 A'WA. 
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3.2 Non-linear Least Squares 

For non-linear least squares X = {x \ y — F(6)}. The majorization scheme is 

v(0, 0 = *(0 - 2 (y - F(t))'w(F(e) - f(0) + (f(0) - F(t))'D(F(e) - f(0). 

Minimizing over 0 for fixed £ means solving 

V lV (9 ,0 = -2G(0)'W(y - F(0) + 2G(0)'£>(*’(0) - F(0) = 0 , 

where GO) is short for T>F(0). This implicitly defines 6 as a function of £, and we have, at a 
fixed point where 0(£) = £, 

What remains to be done is computing expressions for the derivatives. We have 

P1217&O = 2G(fl'WG(0 - 2 G(0'BG(0, 

V llV (0,O = 2G(O'DG(O-2H(O, 

where 

n n 


w(() = [G«)'ZX?(0 - ffK)]-‘(GK)'BG(0 - GK)WG(O). (2) 

In the linear case i7(£) =0 and G(£) = A, and we find the result in equation (1). Again we 
see that, in general terms, it will be good to have D as small as possible. Note that if we 
have perfect fit, i.e. if y — F(£), then again H(£) = 0 and 

V0(Q = (/ - [GK)'BGK)]-‘G«)WG(0). 

Both in the linear and nonlinear case the derivatives G(£) and A complicate the comparison 
of different diagonal matrices of weights, but the general idea that we want both D — W > 0 
and D as small as possible seems to be a useful summary of the results so far. 

4 Minimum Trace Majorization Bound 

Let us look at various ways in which we can have a small D, while satisfying D — W > 0.There 
are many ways, of course, to measure how large D is. In the context of linear iterations, 
we have seen that what matters is the size of A’DA relative to A'WA, and something very 
similar is true in the non-linear case. But in this section we will look for diagonal matrices 
D that are small in a more general sense, independent of the value of any derivatives or 
coefficient matrices. 


4 



The set of all scalar matrices D = XI with D > W is easy to describe. We have D > W if and 
only if A > A max (kF), the largest eigenvalue of W. Thus we also have D > W if D = ||W||/, 
with W any matrix norm. Perhaps the easiest bound is D = tr (W)I, although that will tend 
to be really unsharp. 

Another easy bound uses V = diag(W). Now V~^WV~^ is a correlation matrix, which 
means its largest eigenvalue is less than or equal to n. Thus D = nV satisfies D > W. 
In general, however, it seems that requiring D to be scalar, and working with the largest 
eigenvalue, or even bound on the largest eigenvalue, will lead to large D, and consequently 
slow convergence of the majorization algorithm. So we will attempt to do better. For now we 
adopt the trace as a simple measure of the size of D, which does not take specific aspects of 
the least squares problem into account. 

What we try to solve in this section is to compute riling tr D over D — W > 0. Let’s call 
this the minimum trace majorization bound or MTMB problem. Before we start, let us note 
the similarity of the MTMB problem to minimum trace factor analysis (MTFA), in which we 
compute max 75 tr D over W — D > 0 (and D > 0 in the case of constrained MTFA). See 
Watson (1992) and Jamshidian and Bentler (1998) for MTFA algorithms. Both MTFA and 
MTMB are examples of semidefinite programming or SDP (Vandenberghe and Boyd (1996)). 

We first analyse the MTMB problem a bit more in detail. The Lagrangian is 

C(D, R) — tr D — tr R(D - W), 

where i? > 0 is a symmetric matrix of Lagrange multipliers (or dual variables). The 
nececessary and sufficient conditions for a minimum are 


D - W > 0, 

R> o, 


diag (R) = /, 
R(D -W) = 0. 


Because 

ma x£(D,R) = 
r > o 

the primal problem is min D max R > 0 C(D,R). 
The dual function is 


tr D 

+ CX) 


if D - W > 0, 
otherwise, 


min C(D, R) 


tr RW if diag (R) = /, 

—oo otherwise, 


and thus dual probern is to maximize tr RW over R > 0 with diag(i?) = /, i.e. over all 
correlation matrices. Because both the primal and the dual problem are structly feasible, the 
optimal value of both problems are equal. 
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4.1 Extreme Correlation Matrices 

Correlation matrices, i.e. positive semi-definite matrices with a unit diagonal, form a compact 
convex £ nxn set in the space of all symmetric matrices. Laurent and Poljak (f996) call 
this set the elliptope. A linear function on a compact convex set attains its maximum at 
one of the extreme poonts of the set, i.e. one of the points that cannot be represented as a 
convex combination of two other points in the set. Thus the solution of the dual problem of 
maximizing tr RW is an extreme point of the elliptope. This makes it interesting for us to 
look at the structure of such extreme points 

Correlation matrices of rank one are of the form xx' , where a; is a vector with element ±f. 
Such matrices are called cut matrices in Laurent and Poljak (1996), and their convex hull is 
the cut polytope. The max-cut problem is maximizing a linear function over the cut polytope, 
and clearly maximizing that linear function over the elliptope is a convex relaxation of the 
max cut problem. Cut matrices are extreme points of the elliptope, but having rank one is 
only sufficient, not necessary, for extreme points. A comprehensive recent paper on the rank 
and structure of the extreme points of the elliptope, which also reviews previous results on 
the topic, is Li and Tam (1994). 

ft was first shown by Grone, Pierce, and Watkins (1990) that the elliptope £ nxn contains 
an extreme point of rank r if and only if r(r + 1) < 2 n. The face structure of the elliptope 
was analyzed in Laurent and Poljak (1996), and necessary and sufficient conditions for a 
correlation matrix of rank r to be an extreme point are given in Ycart (1985), Li and Tam 
(1994), Parthasarathy (2002), and Hurlimann (2015). 

5 Algorithm 

Although we have a choice of many different SDP algorithms, we use a special purpose method 
to solve MTMB. It seems to work well. We maximize tr U'UW by cycling over the columns 
of U, changing one of them at a time, requiring throughout that lij-u, = 1 . Clearly the solution 
for column tq, keeping all other column at their current value, is given by v = Y,k^i w ij u k 
and then Uj = n/||n||. The appendix has a simple R function maxR() to do exactly this 

If we have found the solution, we use R(D — W) = 0 to find the solution of the primal 
problem. 

5.1 Examples 

We give some example of application of our dual algorithm. Our first matrix W is 


## 


[, 1 ] 

[, 2 ] 

C , 3 ] 

[, 4 ] 

[, 5] 

[ , 6 ] 

## 

[1J 

+1 

.0 

-1 

.0 

+0. 

,0 

+0. 

,0 

+0. 

,0 

+0 

.0 

## 

[ 2 ,] 

-1 

.0 

+2 

.0 

- 1 . 

,0 

+0. 

,0 

+0. 

,0 

+0 

.0 

## 

[ 3 ,] 

+0 

.0 

-1 

.0 

+ 2 . 

,0 

- 1 . 

,0 

+0. 

,0 

+0 

.0 

## 

[ 4 ,] 

+0 

.0 

+0 

.0 

- 1 . 

,0 

+ 2 . 

,0 

- 1 . 

,0 

+0 

.0 

## 

[5,] 

+0 

.0 

+0 

.0 

+0. 

,0 

- 1 . 

,0 

+ 2 . 

,0 

-1 

.0 

## 

[ 6 ,] 

+0 

.0 

+0 

.0 

+0. 

,0 

+0. 

,0 

- 1 . 

,0 

+ 1 

.0 
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This has largest eigenvalue 3.7320508076, and thus the best scalar D has trace 22.3923048454. 
Using the trace of W for a scalar D gives a trace of D of 60. The iterations of our algorithm, 


which always start with R 

= /, are 



## 

itel 


1 

fold 

10.000000 

fnew 

17.656854 

## 

itel 


2 

fold 

17.656854 

fnew 

19.431440 

## 

itel 


3 

fold 

19.431440 

fnew 

19.804726 

## 

itel 


4 

fold 

19.804726 

fnew 

19.914796 

## 

itel 


5 

fold 

19.914796 

fnew 

19.962473 

## 

itel 


6 

fold 

19.962473 

fnew 

19.983844 

## 

itel 


7 

fold 

19.983844 

fnew 

19.993073 

## 

itel 


8 

fold 

19.993073 

fnew 

19.997032 

## 

itel 


9 

fold 

19.997032 

fnew 

19.998729 

## 

itel 


10 

fold 

19.998729 

fnew 

19.999455 

## 

itel 


11 

fold 

19.999455 

fnew 

19.999767 

## 

itel 


12 

fold 

19.999767 

fnew 

19.999900 

## 

itel 


13 

fold 

19.999900 

fnew 

19.999957 

## 

itel 


14 

fold 

19.999957 

fnew 

19.999982 

## 

itel 


15 

fold 

19.999982 

fnew 

19.999992 

## 

itel 


16 

fold 

19.999992 

fnew 

19.999997 

## 

itel 


17 

fold 

19.999997 

fnew 

19.999999 

## 

itel 


18 

fold 

19.999999 

fnew 

19.999999 

The optimum R is a cut matrix, given as 


## 


[, 1 ] 

[,2] [,3] 

[,4] [, 5] 

[, 6 ] 


## 

[1J 

+ 1.0 

-1. 

,0 + 1.0 

- 1.0 + 1.0 - 

- 1.0 


## 

[ 2 ,] 

- 1.0 

+1 . 

,0 - 1.0 

+ 1.0 - 1.0 + 1.0 


## 

[3,] 

+ 1.0 

-1. 

,0 + 1.0 

- 1.0 + 1.0 - 

- 1.0 


## 

[4,] 

- 1.0 

+1 . 

,0 - 1.0 

+ 1.0 - 1.0 + 1.0 


## 

[5,] 

+ 1.0 

-1. 

,0 + 1.0 

- 1.0 + 1.0 - 

- 1.0 


## 

[ 6 ,] 

- 1.0 

+ 1 . 

,0 - 1.0 

+ 1.0 - 1.0 + 1.0 



and the diagonal of the corresponding D is 

## [1] 2.000000 4.000000 4.000000 4.000000 4.000000 2.000000 

This is close to the best scalar D, because in this first example all eigenvalues of W are close. 

In the second example we take W of order 6 , but of rank one. 


## 


[, 1 ] 

[, 2 ] 

[, 3] 

[ >4] 

[, 5] 

[, 6 ] 

## 

[ 1 J 

+ 0.0 

+ 0.1 

+ 0.1 

+ 0.2 

+ 0.2 

+ 0.2 

## 

[ 2 ,] 

+ 0.1 

+ 0.2 

+ 0.2 

+0.3 

+0.4 

+0.5 

## 

[3,] 

+ 0.1 

+ 0.2 

+0.4 

+0.5 

+ 0.6 

+0.7 

## 

[4,] 

+ 0.2 

+0.3 

+0.5 

+ 0.6 

+ 0.8 

+ 1.0 

## 

[5,] 

+ 0.2 

+0.4 

+ 0.6 

+ 0.8 

+ 1.0 

+ 1.2 

## 

[ 6 ,] 

+ 0.2 

+0.5 

+0.7 

+ 1.0 

+ 1.2 

+ 1.4 


7 



Its largest eigenvalue (and its trace) is 3.64. The iterations are 


## itel 

1 

fold 

3.640000 

fnew 

17.132389 


## itel 

2 

fold 

17.132389 

fnew 

17.633437 


## itel 

3 

fold 

17.633437 

fnew 

17.639916 


## itel 

4 

fold 

17.639916 

fnew 

17.639999 


## itel 

5 

fold 

17.639999 

fnew 

17.640000 


The cut matrix 

we find is 





## C, 1 ] 

[, 2 ] 

[ , 3 ] 

[ ,4] [,5] 

[, 6 ] 



## [ 1 ,] + 1.0 

+ 1.0 

+ 1.0 

+ 1.0 + 1.0 

+ 1.0 



## [ 2 ,] + 1.0 

+ 1.0 

+ 1.0 

+ 1.0 + 1.0 

+ 1.0 



## [3,] +1.0 

+ 1.0 

+ 1.0 

+ 1.0 + 1.0 

+ 1.0 



## [4,] +1.0 

+ 1.0 

+ 1.0 

+ 1.0 + 1.0 

+ 1.0 



## [5,] +1.0 

+ 1.0 

+ 1.0 

+ 1.0 + 1.0 

+ 1.0 



## [ 6 ,] + 1.0 

+ 1.0 

+ 1.0 

+ 1.0 + 1.0 

+ 1.0 



which corresponds with D equal to 




## [1] 0.840000 1 

.680000 2.520000 

3.360000 4.200000 5 

.040000 

6 Majorization Examples 



6.1 Monotone Regression 




Consider the problem of minimizing (y - 

- x)'W (■ y — x) over all x 

satisfying X\ < ■ ■ ■ < x n . If 

W is diagonal, this problem can be solved quickly (in linear time) with the usual monotone 
regression algorithms, and thus we can use majorization to reduce the problem to a sequence 

of such monotone regressions. 




In our example 

we use W equal to 




## [, 1 ] 

[ , 2] [ , 3] 

[ , 4] [, 5] 

[ , 6] 

[ ,7] [ , 8 ] [ , 9] 

[, 10 ] 

## [ 1 ,] 1 

1 

1 

1 1 

1 

1 1 1 

1 

## [ 2 ,] 1 

2 

2 

2 2 

2 

2 2 2 

2 

## [3,] 1 

2 

3 

3 3 

3 

3 3 3 

3 

## [4,] 1 

2 

3 

4 4 

4 

4 4 4 

4 

## [5,] 1 

2 

3 

4 5 

5 

5 5 5 

5 

## [ 6 ,] 1 

2 

3 

4 5 

6 

6 6 6 

6 

## [7,] 1 

2 

3 

4 5 

6 

7 7 7 

7 

## [ 8 ,] 1 

2 

3 

4 5 

6 

7 8 8 

8 

## [9,] 1 

2 

3 

4 5 

6 

7 8 9 

9 

## [ 10 ,] 1 

2 

3 

4 5 

6 

7 8 9 

10 

and y equal to 







## [1] 1 3 

; 2 

3 3 

1 1 4 

4 1 
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Matrix W has a largest eigenvalue equal to 44.7660686527 and a trace equal to 55. The 
trace-optimal D is 

## [1] 10 19 27 34 40 45 49 52 54 55 

We make three runs of the majorization version jbkWPavaO, which uses jbkPavaO from 
De Leeuw (2017) for the diagonal monotone regression. Using a scalar matrix with the 
trace requires 355 iterations, using the largest eigenvalue uses 296 iterations, and using the 
trace-optimal D uses 113 iterations. Of course this does not address the question if the 
gain in the number of iterations offsets the extra computation needed for either the largest 
eigenvalue or the trace-optimal D. 

6.2 Multidimensional Scaling 

Suppose A r with r = 1 are independent and identically distributed symmetric and 
hollow dissimilarity matrices of order n. Define for i < j 

ijri 


l 


R 


-h- r=1 


and for i < j and k < l 


R 


IRijkf. ^ ) ($ijr fiij ) ( tikf/r &ki) 

-h- r=1 


Also define the inverse V of W = {wijke}, and now minimize the stress loss function 


cr(X) — EE EE Vijw(Sij — dij(X))(5ki — dkt(X)). 


The minimum of stress is distributed as chi-square with \n{n — 1) — (pn — \p{p + 1)) degrees 
of freedom. Bounding the matrix V by a diagonal matrix with elements e t] allows us to use 
the usual smacof algorithm (De Leeuw and Mair (2009)). 

More precisely minimizing the majorization function amounts to minimizing over configura¬ 
tions X the loss 


<r(A', V‘>) = ££e«(d«(X) - (dy(X«) + r # (I»))f, 

i<j 

where 

r«(A'W) = -X£ w ijU (6ij -d«(A«‘>)). 
e h k<l 

In an actual program we will perform a number of smacof inner iterations to update X^ 
without insisting on convergence to define X^ k+l \ In fact, a single inner iteration may suffice. 

This approach can be extended to group difference models, in which each group has a different 
set of distances, and all groups are large. Actual examples of this approach are a bit hard to 
come by, because usually existing data sets are already averaged over individuals, or they are 
too small to allow for an invertible covariance matrix of the dissimilarities. 
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6.3 Structural Equation Models 

The generalized least squares loss function in multinomial multivariate analysis is 

<j{B) = tr S~\S - E(0))5 _1 (5 - £(0)). 

If we define r(6) := vec(S — £(#)) then a(6) = r(6)'(S~ 1 0 S~ 1 )r(9). If D > S then 
D 0 D > S' -1 0 S' -1 , and of course d 0 Tis diagonal. And if D is scalar, so is D 0 D. Thus 
we can majorize with a weighted sum or, less precisely, with an unweighted sum of squares. 

7 Appendix: Code 

7.1 maxR 

maxR <- function (w, 

eps = le-6, 
itmax = 100, 
verbose = FALSE) { 

n <- nrow (w) 
x <- diag (n) 
r <- diag (n) 
itel <- 1 

fold <- sum (diag (w)) 
repeat { 

for (i in l:n) { 

y <- x °/o*°/o w[, i] - w[i, i] * x[, i] 

x [, i] <- y / sqrt (sum (y ~ 2)) 

r [, i] <- r[i, ] <- x[, i] 1*1 x 

> 

fnew <- sum (r * w) 
if (verbose) 
cat ( 

"itel ", 
formate ( 
itel, 

digits = 4, 
width = 6, 
format = "d" 

), 

" fold ", 
formate ( 
fold, 

digits = 6, 
width = 10, 
format = "f" 
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), 

" fnew " , 
formate ( 
fnew, 

digits = 6, 
width = 10, 
format = "f" 

), 

"\n" 

) 

if (((fnew - fold) < eps) I I (itel == itmax)) 
break 

itel <- itel + 1 
fold <- fnew 

} 

d <- (r y.*y. w) / r 

return (list (r = r, d = d[l, ], itel = itel)) 


7.2 jbkWPava 

dyn.load(" jbkPava. so") 

jbkPava <- function (x, w = rep(l, length(x))) { 
h <- 

.C( "jbkPava" , 

x = as.double (x), 
w = as.double (w), 
n = as.integer(length (x))) 
return (h$x) 

> 

jbkWPava <- 
function (y, 
w, 
d, 

eps = le-6, 
itmax = 100, 
verbose = FALSE) { 
n <- length (d) 
h <- (w %*% y) / d 
xold <- l:n 
itel <- 1 
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fold <- sum (y * (w 1*1 y)) 
repeat { 

g <- h + xold - (w 1*1 xold) / d 
xnew <- jbkPava (g, d) 

fnew <- sum ((xnew - y) * (w 1*1 (xnew - y))) 
if (verbose) { 
cat ( 

"itel ", 

f ormatC(itel , width = 3, format = "d"), 

" fold ", 
formate ( 
fold, 

width = 8, 
digits = 4, 
format = "f" 

), 

" fnew " , 
formate ( 
fnew, 

width = 8, 
digits = 4, 
format = "f" 

), 

"\n" 

) 

} 

if (((fold - fnew) < eps) I I (itel == itmax)) 
break 

fold <- fnew 
xold <- xnew 
itel <- itel + 1 

> 

return (list (x = xnew, itel = itel, f = fold)) 

} 
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